I. Introduction



This note* is concerned with ideas and problems
involved in cross-classification of observations 
on a given population. Most of the note will be 
confined to the discussion of two-dimensional 
cross-classifications. An example of this is the 
two-dimensional cross-classification of a portion 
of a deck (population) of playing cards resulting 
from the classification of the cards according to 
suit and according to whether or not the card is a 
face card. Such a cross-classification would 
consist of a tabulation of the numbers in each of 
the (4 x 2 = 8) possible combinations of suit and
face-or-no-face characteristics. 

The main objectives of this note are: 

1) to establish a conceptual framework for
characterization and comparison of
cross-classifications;

2) to discuss existing methods for characteriza- 
tion of cross-classifications; 

3) to propose a new approach and a new method 
for characterizing and making inferences from cross- 
classifications; 

4) to indicate how Markov processes can be 
treated as cross-classifications. 



* The author wishes to thank Stephen Clark, George 
Mayeske, Richard O'Brien, and Frederic Weinfeld for 
suggestions made during the writing of this note. 



II. Terminology



The word "event" is the conventional probabilistic
term used to indicate what is observed; we
will speak of observations on a given population 
with the assumption that one or more events are 
observed with each observation. The generic term 
"event type" will be used to specify the character- 
istic being classified in one dimension of a cross- 
classification. When we observe two or more events 
in a single observation we are observing the joint 
occurrence of the given events. The number of 
(joint) events so observed is the dimension of the 
cross-classification and is also the number of 
event types in the cross-classification. The suit 
of a card classifies it according to one event type 
while the face-or-no-face characteristic classifies 
it according to another event type. Within each 
event type there are two or more "event classes"; 
these classes are mutually exclusive and exhaustive,
i.e., each observation belongs to exactly one event 
class within each event type. Within the suit 
event type the four event classes are club, diamond, 
heart, and spade. See figure 1. 
What are called "event classes within event types"
here are called "subcategories within attributes"
by Guttman (1), p. 258, "classifications within
criteria" by Mood (2), p. 2?4, "classes within
polytomies" by Goodman and Kruskal (3), and
"categories within variables" by Kendall and
Stuart (6).

[...]tions is given by Kendall and Stuart (6) in a
55-page chapter entitled Categorized Data. The
present paper deals exclusively with categorized
data, which are observations which identify
(event classes within event types) in a qualitative,
non-numerical, non-ordered manner. For instance,
the suits and colors of playing cards are event
types which are based on categorized data.



Three kinds of probabilities are distinguished
in this note:



a) unconditional (or marginal) probabilities 



b) conditional probabilities 



c) joint probabilities 



We will be assuming that we are observing elements
of a well-defined basic population and at each
observation two or more events occur. The unconditional
probability of an event is the probability of the
occurrence of that event without regard to the
occurrence of any other event. It is the fraction
of the observations in which that event has
occurred if observations have been taken on the
entire basic population.



The idea of joint probability is the same as
unconditional probability except that we are
concerned with two (or more) events occurring in a
single observation instead of just one. The joint
probability of two events is the fraction of the
observations in which both events would occur were
the whole basic population observed. The ampersand
will be used to indicate joint occurrences in this
note: "A&B" means the joint occurrence of events
A and B and "Pr(A&B)" means the joint probability
of their occurrence.



The idea of conditional probability also
involves at least two events, say A and B. The
conditional probability that A occurs given the
condition that B also occurs means that we are
evaluating a probability within a population
restricted by some condition which was not part
of the original definition of the basic population.
The occurrence of event A given that event B also
occurs is symbolized by "A|B" and "Pr(A|B)" means
the conditional probability of event A given event
B. The conditional probability of A given B is
often defined as the ratio of their joint
probability to the unconditional probability of B:

Pr(A|B) = Pr(A&B) / Pr(B).



This means: of all the times that B occurs, Pr(A|B) 
is that fraction of the times wherein A also occurs. 

When we speak of the occurrence of event A, we
imply the idea of the nonoccurrence of A. In
other words, in the back of our minds we are
considering an A-type event which includes two
event classes: class A and class non-A. Not-A
will be symbolized by "Ā" in this note. This
A-type event is also known as a "binary variable"
or a variable which can take on two values, A and
Ā.* "Event type" is equivalent to the word
"variable" here, and the language of this note
could have been based on "qualitative variables"
instead of "categorized event types."



* Other words which are used for this kind of 
variable are: counting, indicator, dichotomous, 
two-state, two-point, and zero-one variable or 
distribution. 




III. Statistical Independence 



The standard definition of the statistical
independence of two events, A and B, is that the
probability of their joint occurrence is the
product of their probabilities:

Pr(A&B) = Pr(A) x Pr(B).

From the definition of conditional probability
we obtain the definition of independence
in terms of conditional probability: events A and
B are independent if and only if

Pr(A|B) = Pr(A).



The Venn diagram which is Figure 2 shows that
the probability domains of the two events must
overlap a certain amount in order to be
independent. The amount required for their
independence is the product of the probabilities
of the events involved; e.g., if Pr(A) = .6
and Pr(B) = .8 then in order for A and B to be
independent the intersection of their probability
domains must be Pr(A) x Pr(B), i.e., .6 times .8
or .48:




Figure 2 
If the amount of overlap is less than the product
required for independence, the occurrence of either
event has a tendency to preclude the occurrence of
the other. If the two probability domains do not
overlap, then each event completely precludes the
occurrence of the other and they are mutually
exclusive. On the other hand, if the amount of
overlap is greater than the amount required for
independence then the occurrence of either event
enhances the probability of the occurrence of the
other. If there is complete overlap with one
probability domain covering that of the other, then,
of course, the occurrence of the second is completely
dependent on the occurrence of the first.

Converting the Venn diagram of Figure 2 into tabular
form, a 2 by 2 joint probability table with marginal
(unconditional) probabilities is obtained (Figure 3).
The sums in rows and columns are marginal probabilities.

Figure 3 (2 by 2 joint probability table with
marginal probabilities)

Given the marginal probabilities (of the A type and of
the B type) and an entry in any of the four cells
of the body of the table (a, b, c, or d), the
remaining cells are easily deduced. For example,
if a is .5 then since a+b = .6, b = .1 and since
a+c = .8, c = .3 and finally, since c+d = .4, d = .1.
Note that if a is .5, A and B are not independent;
they enhance each other's occurrence.
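The deduction of the remaining cells can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `complete_2x2` and `association_direction` are hypothetical and not part of the note. It assumes the layout used in the text, with a+b = Pr(A) and a+c = Pr(B).

```python
def complete_2x2(a, pr_a, pr_b):
    """Fill in a 2 x 2 joint probability table from one cell and the marginals.

    a    = Pr(A&B)
    pr_a = Pr(A) = a + b   (row marginal)
    pr_b = Pr(B) = a + c   (column marginal)
    Returns the four cells (a, b, c, d).
    """
    b = pr_a - a            # A occurs, B does not
    c = pr_b - a            # B occurs, A does not
    d = 1.0 - a - b - c     # neither occurs
    if min(a, b, c, d) < -1e-12:
        raise ValueError("cell value is incompatible with the marginals")
    return a, b, c, d


def association_direction(a, pr_a, pr_b, tol=1e-12):
    """Compare the overlap a with the product required for independence."""
    product = pr_a * pr_b
    if abs(a - product) <= tol:
        return "independent"
    return "enhancing" if a > product else "precluding"


cells = complete_2x2(0.5, 0.6, 0.8)
print([round(v, 3) for v in cells])          # the cells a, b, c, d
print(association_direction(0.5, 0.6, 0.8))  # overlap .5 exceeds .48
```

With the numbers of the example (a = .5, marginals .6 and .8) the sketch reproduces b = .1, c = .3, d = .1 and reports that the events enhance each other.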



The tabular representation has the advantages
of explicitly identifying Ā and B̄ and of treating
them and A and B in a uniform manner. The tabular form also
lends itself more easily to event types which are
classified into several (not just two) classes.
Suppose that the type A event has three classes
A_1, A_2 and A_0 where A_0 are events which belong to
neither A_1 nor A_2. Similarly, suppose the type
B events are categorized into classes B_1, B_2 and B_0.



The Venn diagram of this with some probabilities 
put in as an example is Figure 4. 




The same joint probabilities information in
tabular form appears in Figure 5.




Consider the independence-dependence characteristics
of the nine joint events whose probabilities
are in the body of the table. We make the following
four observations:

1) A_0 is independent of all the B's;

2) B_2 is independent of all the A's;

3) A_1 enhances the probability of B_1 while
diminishing that of B_0;

4) A_2 enhances the probability of B_0 while
diminishing that of B_1.

This example shows that instances of dependence 
can be scattered about among various joint events 
when the event types each include several classes. 

In this example with three classes in each of
two event types, there are (3-1)(3-1) = 4 degrees of
freedom in the determination of the probabilities.
In other words, given the marginal probabilities,
knowledge of certain sets of four of the nine
joint probabilities in the body of the table
determines all of them. These sets of probabilities
are sets of four such that (after entering the set)
at least one cell in each row and column remains
empty and none of the remaining probabilities in
the body of the table can be determined immediately
from both its row and its column. Another way of
stating the second requirement is that if an entry
can be determined from the other entries in its
row, it must not be possible to determine it from
the other entries in its column. This can be
generalized to m event types with n_1, n_2, ..., n_m
classes, respectively. Then, given the marginal
probabilities, there are (n_1-1)(n_2-1)...(n_m-1)
degrees of freedom in determining the joint
probabilities.

Testing for independence in a contingency table
is a classical statistical problem discussed in
many statistical books (see, for instance, Mood (2),
pp. 273-81). A contingency table tabulates a set of
observations according to two event types (criteria).
Consider, for example, the classification of a group
of people according to the two event types of
vision and weight. The vision of each person in
the group belongs to one of the three classes:
near-sighted, normal sighted, and far-sighted; and
the weight of each belongs to one of the three
classes: underweight, normal weight, and overweight.
The contingency table for these two event types
would be a 3 x 3 table indicating the number of
observations in each of the nine possible
combinations of vision and weight.

Testing the independence of the two event types
in a contingency table can be done in the following
three steps.

1) Change the number in each of the cells to the
fraction it is of the total number of
observations, i.e., divide each cell number by the
total number.










2) Enter the row and column sums of these
fractions as marginals. (The ij-th cell now
is an estimate of the probability that the
i-th row event and the j-th column event
occur together (jointly). If the two events
are independent, another estimate of their
joint probability is the product of the i-th
row marginal and the j-th column marginal.)

3) Let O_ij be the observed fraction of the total
in the ij-th cell and E_ij be the expected
fraction assuming independence, i.e., the
product of the i-th row and j-th column
marginals; then compute the test statistic
for the (n x m) contingency table,

\[ N \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where N is the total number of observations
(the factor N is needed because O_ij and E_ij
are fractions rather than counts).
Under the assumption of independence this
statistic is approximately chi-square
distributed with (m-1)(n-1) degrees of freedom.
The approximation to chi-square improves with
larger numbers of observations in the cells.
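The three steps can be sketched as follows. This is a minimal illustration, not the note's own program: the function name `chi_square_independence` is hypothetical, and the counts used in the example call are invented for a vision-by-weight table. Because the cells are converted to fractions, the sum is scaled by the total count, which gives the usual Pearson statistic. The sketch assumes that no row or column is empty.

```python
def chi_square_independence(counts):
    """Return (statistic, degrees_of_freedom) for an n x m table of counts."""
    total = sum(sum(row) for row in counts)
    n, m = len(counts), len(counts[0])

    # Step 1: change each cell to the fraction it is of the total.
    observed = [[cell / total for cell in row] for row in counts]

    # Step 2: row and column sums of the fractions are the marginals.
    row_marg = [sum(row) for row in observed]
    col_marg = [sum(observed[i][j] for i in range(n)) for j in range(m)]

    # Step 3: compare observed fractions with products of marginals.
    stat = 0.0
    for i in range(n):
        for j in range(m):
            expected = row_marg[i] * col_marg[j]
            stat += (observed[i][j] - expected) ** 2 / expected

    # Scale by the total because the cells are fractions, not counts.
    return total * stat, (n - 1) * (m - 1)


# Hypothetical counts: rows are vision classes, columns are weight classes.
example = [[20, 30, 10],
           [25, 60, 25],
           [10, 30, 20]]
statistic, dof = chi_square_independence(example)
print(round(statistic, 3), dof)
```

The statistic would then be compared with a chi-square table at the returned degrees of freedom.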




IV. Measures of Association

The test for independence of two types of events
can be a basis for deciding whether or not the two
event types are independent; if the test leads one
to reject the independence hypothesis, one may want
a more complete explanation of the non-independence.
One way to investigate non-independence is by
employing a "measure of association" which, in the
statistical literature, has meant a measure of the
direction and amount of departure from independence
of a given cross-classification. Goodman and
Kruskal have written three extensive (32, 40, and
54 pages) papers on measures of association (3), (4),
(5). They favor a measure originally proposed by
Guttman in (1), but they feel that the uses of
measures of association are varied enough so that
there should be several from which to choose.

None of the studies seen by the author, however, 
considers analysis of departures from independence 
in detail within a cross-classification (i.e., in 
the body of a joint probability table). This is to
say that the measures of association thus far 
developed are meant to show (dependence) relation- 
ships between event types and not those between the 
event classes of one event type and those of another 
event type. 

If we were working with numerical variables 
instead of categorized event types and we assumed 
that the variables were linearly related, we would 
probably consider the correlation coefficient as 
the first candidate for measuring their association. 
A set of three characteristics of the correlation
coefficient traditionally has been sought in the
appraisal of measures of association for categorized
data:

1) the measure is zero when the event types 
are independent; 

2) the measure is minus one when they have 
the maximum disassociation; 

3) the measure is plus one when they have the 
maximum association. 
We will consider a measure of association which, in
its immediate application, will be applied only to
a 2 x 2 cross-classification although in its
ultimate application it will appear in the assessment
of general n x m cross-classifications. In addition
to the above set of characteristics we shall
require our measure to have another set of
characteristics related to the correlation coefficient. This
set has to do with the symmetry of the measure
between events (classes) and their complements.
Specifically, if Z is the value of a measure of
the association between events A and B, then it
must also be the value between Ā and B̄. Also, the
value of the measure between A and B̄ and between
Ā and B must be -Z.
V. The Z-measure of Association

It will now be shown how one develops a
measure of association conforming to the requirements
given above. The cell of interest in an n x m table
is examined by reducing the table to a 2 x 2
table as in Figure 6.

            B        B̄
  A         a        b        y
  Ā         c        d        1-y
            x        1-x

Figure 6

This is done by letting



a = the cell being studied,

b = sum of the remainder of
the row entries,

c = sum of the remainder of
the column entries, and

d = sum of the remainder of
the table (= 1-a-b-c);

x and y are marginal probabilities in both the n x m
and the 2 x 2 tables: x = a+c and y = a+b. All
the information about the 2 x 2 table is contained
in the three quantities, a, x, and y.
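The reduction of a cell of an n x m joint probability table to the three quantities a, x, and y can be sketched as below. The function name `collapse_to_2x2` and the 3 x 3 table are hypothetical, chosen only to illustrate the definitions x = a+c and y = a+b.

```python
def collapse_to_2x2(table, i, j):
    """Return (a, x, y) for cell (i, j) of a joint probability table.

    a = the cell being studied
    y = a + b = marginal of the cell's row
    x = a + c = marginal of the cell's column
    """
    a = table[i][j]
    y = sum(table[i])
    x = sum(row[j] for row in table)
    return a, x, y


# Hypothetical 3 x 3 joint probability table (entries sum to one).
joint = [[0.10, 0.20, 0.10],
         [0.05, 0.25, 0.10],
         [0.05, 0.05, 0.10]]

a, x, y = collapse_to_2x2(joint, 0, 1)
b, c = y - a, x - a          # remainder of the row, remainder of the column
d = 1.0 - a - b - c          # remainder of the table
print(round(a, 3), round(x, 3), round(y, 3), round(d, 3))
```

The same call can be repeated for every cell, so that each cell of a larger table is treated through its own 2 x 2 reduction.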

A measure of association between two numerical
variables is their correlation coefficient. The
definition of the correlation coefficient for two
variables A and B is:

\[ \frac{E(AB) - E(A)\,E(B)}{\sqrt{V(A)\,V(B)}} \]

where E(A) is the expected value of A and V(A) is
the variance of A which is E(A^2) - [E(A)]^2.

We are concerned here with binary variables,
specifically the bivariate binary distribution. In this
distribution the A type of event is mapped on the
integers zero and one:
Pr(A-or-not-A event type = 0) = Pr(Ā) = 1-Pr(A)
and

Pr(A-or-not-A event type = 1) = Pr(A).

A similar definition of the binary distribution of
B applies. Then the following equations hold:

E(AB) = Pr(A&B) = a

E(A) = E(A^2) = Pr(A) = y

E(B) = E(B^2) = Pr(B) = x

V(A) = y(1-y)

V(B) = x(1-x).

This leads to a formula for the correlation coefficient
associated with the 2 x 2 table in Figure 6,

\[ \mathrm{Phi}(a,x,y) = \frac{a - xy}{\sqrt{y(1-y)\,x(1-x)}} \]

"Phi coefficient" is the usual name for this
statistic, especially in psychological statistics.* It is
zero when the events are independent (a = xy). It
takes on the value plus one only when a is at its
maximum and when x = y. The maximum of a is the
lesser of x and y. The phi coefficient takes on
the value minus one only when a is at its minimum
and when x + y = 1. The minimum of a is zero when
x + y ≤ 1 and x + y - 1 when x + y ≥ 1.
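The phi coefficient and the limits on a can be checked numerically with a short sketch. The function names `phi` and `a_bounds` are hypothetical; the formulas are those stated in the text.

```python
import math


def phi(a, x, y):
    """Correlation coefficient of two binary variables with Pr(A&B) = a,
    Pr(B) = x and Pr(A) = y."""
    return (a - x * y) / math.sqrt(x * (1 - x) * y * (1 - y))


def a_bounds(x, y):
    """Smallest and largest joint probability compatible with the marginals."""
    return max(0.0, x + y - 1.0), min(x, y)


# With x = .8 and y = .6 phi cannot reach plus or minus one.
low, high = a_bounds(0.8, 0.6)
print(round(phi(low, 0.8, 0.6), 3), round(phi(high, 0.8, 0.6), 3))

# It reaches +1 only when x = y and -1 only when x + y = 1.
print(round(phi(0.6, 0.6, 0.6), 3), round(phi(0.0, 0.4, 0.6), 3))
```

The first line of output shows the restricted range for unequal marginals; the second shows the two cases in which the extreme values are attained.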

We want a measure of association which always
takes on the value minus one when the least possible
association prevails and always takes on the value
plus one when the greatest possible association
prevails. Such a measure is obtained by dividing
the phi coefficient by its maximum possible value
when positive association prevails and by its
minimum possible value when negative association
prevails. We now define such a Z measure of association



* For a discussion of the phi-coefficient see Guilford
(7), pp. 333-36.
for four possible situations which cover all the
possibilities of a 2 x 2 table.

Situation I. a > xy, x ≥ y, the binary variables are
positively associated. Z is the ratio of phi
coefficient to its maximum:

\[ Z(a,x,y) = \frac{a-xy}{\sqrt{x(1-x)y(1-y)}} \Big/ \frac{y-xy}{\sqrt{x(1-x)y(1-y)}} = \frac{a-xy}{y-xy} = \frac{a-xy}{y(1-x)} \]

Situation II. a < xy and x+y ≤ 1, the binary variables
are negatively associated (disassociated); Z is the
negative of the ratio of phi coefficient to its
minimum:

\[ Z(a,x,y) = (-1)\,\frac{a-xy}{\sqrt{x(1-x)y(1-y)}} \Big/ \frac{-xy}{\sqrt{x(1-x)y(1-y)}} = \frac{a-xy}{xy} \]

Situation III. a < xy and x+y ≥ 1, the binary
variables are negatively associated (as in II). Z is
the negative of the ratio of phi coefficient to its
minimum:

\[ Z(a,x,y) = (-1)\,\frac{a-xy}{\sqrt{x(1-x)y(1-y)}} \Big/ \frac{x+y-1-xy}{\sqrt{x(1-x)y(1-y)}} = \frac{a-xy}{1-x-y+xy} = \frac{a-xy}{(1-x)(1-y)} \]

Situation IV. a = xy, the binary variables are
independent; Z = 0.

In situations II and III, the sign is made negative
to indicate disassociation.
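The four situations can be collected into one function. This is a sketch under one stated assumption: instead of requiring x ≥ y in Situation I, it divides by min(x,y) - xy, which equals y(1-x) when x ≥ y. The name `z_measure` and the tolerance parameter are hypothetical.

```python
def z_measure(a, x, y, tol=1e-12):
    """Z-measure of association for a 2 x 2 table given a, x and y."""
    product = x * y
    if abs(a - product) <= tol:
        return 0.0                                    # Situation IV
    if a > product:
        return (a - product) / (min(x, y) - product)  # Situation I
    if x + y <= 1.0:
        return (a - product) / product                # Situation II
    return (a - product) / ((1 - x) * (1 - y))        # Situation III


# Example from Section III: a = .5 with marginals .8 and .6.
print(round(z_measure(0.5, 0.8, 0.6), 4))
# Extreme overlaps for the same marginals.
print(round(z_measure(0.6, 0.8, 0.6), 4), round(z_measure(0.4, 0.8, 0.6), 4))
```

For the Section III example the overlap exceeds the independence product by .02 out of a possible .12, so Z is one sixth; the extreme overlaps .6 and .4 give plus one and minus one.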

Horst (8), pp. 238-39, points out that this
Z-measure is the ratio of two covariances, an
observed covariance over a maximum (Sit. I) or a
minimum (Sits. II and III) covariance. He uses it
as a measure of the homogeneity of test items.
This is an interesting case in which the event
classes are not mutually exclusive.

The 2 x 2 table which is Figure 6 can be written
solely in terms of a, x, and y. Such a table
equivalence is shown in Figure 7.
            B        B̄
  A         a        y-a          y
  Ā         x-a      1+a-x-y      1-y
            x        1-x

Figure 7

Assuming a > xy and x ≥ y, we now find the Z-measure
for each of the four cells in the body of the table.
An arrow (→) is used to mean implication.

\[ Z(a,x,y) = \frac{a-xy}{y(1-x)} \]

a > xy → 1+a-x-y > 1+xy-x-y = (1-x)(1-y). Thus since
1+a-x-y > (1-x)(1-y), the computation for the lower
right cell, d, is done using the Situation I formula.

\[ Z(d,\,1-x,\,1-y) = Z(1+a-x-y,\,1-x,\,1-y) = \frac{1+a-x-y-(1-x)(1-y)}{(1-x)\,[1-(1-y)]} = \frac{a-xy}{y(1-x)} \]

For the upper right cell, b,

a > xy → y-a < y(1-x)

x ≥ y → 1-x ≤ 1-y → (1-x) + y ≤ 1.

Using Situation II,

\[ Z(b,\,1-x,\,y) = Z(y-a,\,1-x,\,y) = \frac{y-a-(1-x)y}{(1-x)y} = -\frac{a-xy}{y(1-x)} \]

For the lower left cell, c,

a > xy → x-a < x(1-y)

x ≥ y → 1-x ≤ 1-y → (1-y) + x ≥ 1.

Using Situation III,

\[ Z(c,\,x,\,1-y) = Z(x-a,\,x,\,1-y) = \frac{x-a-x(1-y)}{(1-x)\,[1-(1-y)]} = -\frac{a-xy}{y(1-x)} \]

Thus Z(a, x, y) = Z(d, 1-x, 1-y) =
-Z(b, 1-x, y) = -Z(c, x, 1-y)
or in terms of the binary variables A and B,

Z(A,B) = Z(Ā,B̄) = -Z(A,B̄) = -Z(Ā,B).
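The symmetry just derived can be verified numerically for any admissible a, x, and y. The sketch below is illustrative only; `z_measure` repeats the four situations (using min(x,y) in Situation I so that no ordering of the marginals is assumed) and `four_cell_z` is a hypothetical helper that applies it to each cell of Figure 7.

```python
def z_measure(a, x, y, tol=1e-12):
    """Z-measure for a cell with joint probability a and marginals x, y."""
    product = x * y
    if abs(a - product) <= tol:
        return 0.0
    if a > product:
        return (a - product) / (min(x, y) - product)
    if x + y <= 1.0:
        return (a - product) / product
    return (a - product) / ((1 - x) * (1 - y))


def four_cell_z(a, x, y):
    """Z for cells a, b, c, d of the 2 x 2 table written in terms of a, x, y."""
    b, c, d = y - a, x - a, 1 + a - x - y
    return (z_measure(a, x, y),
            z_measure(b, 1 - x, y),
            z_measure(c, x, 1 - y),
            z_measure(d, 1 - x, 1 - y))


za, zb, zc, zd = four_cell_z(0.5, 0.8, 0.6)
print([round(v, 4) for v in (za, zb, zc, zd)])
```

The cells on the main diagonal share one value and the off-diagonal cells share its negative, as the derivation requires.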



Thus we have a measure of association for a pair of
binary variables which has the desired characteristics:

1. zero when variables are independent (Situation
IV)

2. minus one when variables are as
disassociated as possible (in Situations II
and III)

3. plus one when the variables are as completely
associated as possible (in Situation I)

4. Z(A,B) = Z(Ā,B̄) = -Z(A,B̄) = -Z(Ā,B)
Z(a,x,y) is a function which maps points in a
three-dimensional space to points in a
one-dimensional space. Figure 8 shows one aspect of
this mapping: Z as a function of a with x and y
fixed.

Figure 8 (Z and phi as functions of a, with x and y
fixed)

The bivariate binary correlation coefficient, φ(a) or
phi(a), is plotted as a dotted line. It has the
same domain (set of arguments) as Z but is linear
throughout its domain.
One should keep in mind that, although the
Z-measure is related to the correlation coefficient,
it differs from the correlation coefficient in
fundamental ways. The Z-measure is not linear at
Z = 0 except when one or both of the marginals is .5.
It is a two-part linear function of the correlation
coefficient such that the coefficients of the
function are non-linear functions of x and y. The
Z-measure should be judged, however, in its own
right, and not as a replacement for the correlation
coefficient which fails to behave for binary
variables as it does for linearly related continuous
numerical variables. It is hard to find a common
situation with less favorable conditions for
establishing linear relations than that of the
bivariate binary distribution with point-masses of
probabilities at the four corners of a square. The
perfect correlation condition there is when all the
mass is on diagonally opposite corners. As we saw
earlier, this can occur only when x = y (for perfect
correlation) or when x + y = 1 (for perfect negative
correlation). In all other cases the correlation
coefficient cannot attain 1.

The advantage of Z over the correlation coefficient 
is that it has the range of values minus one to 
plus one for all x and y combinations. This allows 
one to compare Z measures based on different x and y 
combinations and say that the amount of interaction 
between two event classes is greater or less than 
that between another pair. 

Additional insight into the nature of the
Z-measure is gained when considering it from the
set-theoretic point of view using the Venn diagram
approach of Section III. The Z-measure essentially
shows the relationship between the amount of
observed overlap and the maximum or minimum potential
overlap of two probability domains. It may be
thought of as the attraction or repulsion force
existing between the probability domains of two
events. In this sense it is considered to be
measuring a symmetric force such as gravity or
magnetic forces.
There have been several discussions of the
Z-measure in psychological statistics. The
Z-measure is usually called "φ / φ max" with little
mention of "φ / φ min". Carroll (9), pp. 363-64,
discusses this measure in terms of 2 x 2 tables
derived from the division of continuous bivariate
distributions into four parts; observations are
classified as being above or below a point on the
scale of one marginal distribution and above or
below a point on the scale of the other marginal
distribution. Carroll particularly refers to the
"tetrachoric correlation coefficient" (which
assumes a bivariate normal underlying distribution)
as the ideal one and finds the Z-measure does not
approximate the tetrachoric correlation coefficient
very well. He rather emphatically dismisses the
Z-measure on this basis. Guilford (7), pp. 337-38,
also warns against the use of the Z-measure as
an "indicator of intrinsic correlation." In general
the psychometricians discount the use of the
Z-measure in lieu of a (Pearson) correlation
coefficient. The mathematical treatment of the
Z-measure, however, needs more development than is
given in the psychometric literature in order to
understand enough about the measure to make proper
use of it. Cureton (10), p. 89, for instance,
states that Z can be +1 only when x = y = .5 and
Z can be -1 only when x = 1-y = .5; whereas Z = +1
if and only if x ≥ y = a and Z = -1 if and only if
x ≤ 1-y and a = 0.

Comparing the joint probability (a) with the
Z-measure for a given joint event, and generalizing
over the four situations we see that

a = xy + cZ

where

c = min(x,y) - xy     if Z ≥ 0

  = xy                if Z ≤ 0 and x+y ≤ 1

  = (1-x)(1-y)        if Z ≤ 0 and x+y ≥ 1.

Z is thus seen to show the relationship between the
joint probability and the associated marginal
probabilities, x and y. The nature of c is further
elaborated in Section VII.
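The relation a = xy + cZ can be used to recover a joint probability from a Z value and the two marginals. The following sketch uses hypothetical function names (`max_deviation` for c and `joint_from_z` for a) and simply transcribes the three cases for c.

```python
def max_deviation(z, x, y):
    """The quantity c: largest possible departure of a from xy in the
    direction indicated by the sign of Z."""
    if z >= 0:
        return min(x, y) - x * y
    if x + y <= 1.0:
        return x * y
    return (1 - x) * (1 - y)


def joint_from_z(z, x, y):
    """Joint probability a = xy + cZ."""
    return x * y + max_deviation(z, x, y) * z


# Z = 1/6 with marginals .8 and .6 returns the overlap used in Section III.
print(round(joint_from_z(1 / 6, 0.8, 0.6), 4))
# Z = 0 returns the independence product; Z = -1 returns the minimum overlap.
print(round(joint_from_z(0.0, 0.8, 0.6), 4), round(joint_from_z(-1.0, 0.8, 0.6), 4))
```

When Z is exactly zero the choice of c is immaterial, since c is multiplied by zero.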

There are also two conditional probabilities
associated with a joint event and these show,
respectively, the relationship between one marginal
and the joint probability and the other marginal
and the joint probability. Conditional probabilities
are used to improve one's ability to predict the
occurrence or non-occurrence of an event by obtaining
information about the occurrence or non-occurrence
of another event. Such a prediction operation
sometimes tends to lead to the imputation of causality
between the two events. The Z-measure is symmetric
in the marginals and thus avoids such an imputation.

A conditional probability without the appropriate
unconditional probability does not tell one whether
the event being conditioned is more or less likely
when it occurs jointly with the conditioning event.
Thus the conditional probability by itself is not a 
measure of association whereas the Z-measure is. 
VI. An Application of the Z-measure

A contingency table which has often appeared as 
an example in textbooks and articles involves the 
cross-classification of 6800 males according to 
their hair and eye color.* Four classes of hair
color were observed: fair, brown, black, and red.
Three classes of eye color were observed: blue, 
hazel or green, and brown. Figure 9 shows the 
original observations (contingency table); Figure 
10 shows the corresponding joint probabilities 
table; Figure 11 shows the corresponding values of 
the Z-measure of association; Figure 12 shows 
conditional probabilities of hair color given eye 
color and eye color given hair color. 



This cross-classification fails the classical 
chi-square test of independence at the .000001 
level. Some of the details brought out by the 
Z-measure are that red hair is essentially indepen- 
dent of eye color for this population while a 
general correlation of eye and hair pigment holds. 

The disassociation of fair hair and brown eyes (-.678)
and of black hair and blue eyes (-.626) are, however,
much more pronounced than the associations of fair
hair and blue eyes (+.365) and of black hair and brown
eyes (+.190). The black hair and brown eye
association is, surprisingly, weaker than both
that of black hair and hazel eyes (.277) and
that of brown hair and brown eyes (.201).



* E.g., see Goodman and Kruskal (3).
VII. The Reverse Inference: Joint Probabilities From
Z-measures

Heretofore we have been considering the derivation
of the Z-measure from a joint probability table. We
now consider the derivation of a joint probability
table given a set of Z-measures and marginal
probabilities. We may be given the marginal distributions
of a cross-classification, for instance, and
(perhaps subjective) estimates of some of the
associations between various states. We can specify
no more Z-measures than the degrees of freedom
involved. The joint probability table and the
Z-measure table of a three by three
cross-classification are shown in Figure 13.
  Z_11  Z_12  Z_13          a_11  a_12  a_13    y_1
  Z_21  Z_22  Z_23          a_21  a_22  a_23    y_2
  Z_31  Z_32  Z_33          a_31  a_32  a_33    y_3
                            x_1   x_2   x_3

Figure 13

We derive definitions of the a_ij from the
specifications for Z's given at the end of Section V.

a_ij = x_j y_i + c_ij Z_ij

where c_ij = min(x_j, y_i) - x_j y_i   if Z_ij ≥ 0

           = x_j y_i                   if Z_ij ≤ 0 and x_j + y_i ≤ 1

           = (1-x_j)(1-y_i)            if Z_ij ≤ 0 and x_j + y_i ≥ 1
Theorem: c_ij ≤ .25

c_ij is composed of two factors, say f_1 and f_2, and
in each of the three cases above the two factors sum
to no more than one. One of the factors, say f_1, is
therefore no greater than .5, say f_1 = .5-u, u ≥ 0.
Then f_2 ≤ .5+u. Suppose (at worst)
f_2 = .5+u. Then f_1 f_2 = (.5-u)(.5+u) = .25-u^2 ≤ .25.

c_ij is the maximum possible difference between
a_ij and x_j y_i given the information about whether a_ij
is larger or smaller than x_j y_i; Z_ij indicates how
much of this maximum deviation is attained by a_ij.

One possibility is that a_ij = x_j y_i; then for each row
and each column Σ a_ij = Σ x_j y_i. However, the row and
column sums are the same (i.e., the marginal probabilities
are fixed) for all permissible sets of a_ij. Thus for
any permissible configuration of the a_ij, for each row
and column, Σ a_ij = Σ x_j y_i and Σ a_ij = Σ (x_j y_i + c_ij Z_ij).
Therefore, for each row and column Σ c_ij Z_ij = 0. For
convenience, let d_ij = c_ij Z_ij. We now have the
following set of six equations defining the relations
among the d_ij for a 3 x 3 joint probability table:

1) d_11 + d_12 + d_13 = 0

2) d_21 + d_22 + d_23 = 0        (row sums)

3) d_31 + d_32 + d_33 = 0

4) d_11 + d_21 + d_31 = 0

5) d_12 + d_22 + d_32 = 0        (column sums)

6) d_13 + d_23 + d_33 = 0

0 < c_ij ≤ .25 and -1 ≤ Z_ij ≤ +1 means that -.25 ≤ d_ij ≤ +.25.



Earlier the number of degrees of freedom associated
with a cross-classification table was discussed
(Section III). If we decide certain sets of four d_ij,
the remainder of a 3 x 3 table of d_ij's are determined.

There are 126 ways of selecting four cells from among
nine; only 81 of these ways conform to the degrees
of freedom requirements stated earlier. Each of
those 81 provides us with a different configuration
of four cell choices in a 3 x 3 table. If we
specify one of these configurations, the d_ij for
the remaining five cells can be stated in terms of
our four initial d_ij. For instance, suppose we are
given d_11, d_22, d_33 and d_31 (see Figure 14). Using
the row and column sum equations we obtain the
remaining d_ij in terms of these four:

d_21 = -d_11 - d_31
d_32 = -d_31 - d_33
d_12 = -d_22 + d_31 + d_33
d_13 = -d_11 + d_22 - d_31 - d_33
d_23 = d_11 - d_22 + d_31

Figure 14
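The count of 81 admissible configurations can be checked by enumeration. The sketch below rests on an interpretation that is an assumption of this example, not a statement from the note: a set of four cells is taken to be admissible when the four values, together with the six zero-sum equations, fix the other five cells uniquely. That holds exactly when the five unspecified cells, viewed as links between their rows and columns, connect all three rows and three columns without forming a closed loop. The function name `determines_table` is hypothetical.

```python
from itertools import combinations


def determines_table(known, n=3):
    """True if fixing the cells in `known` determines every other d_ij."""
    unknown = [(i, j) for i in range(n) for j in range(n)
               if (i, j) not in known]
    parent = list(range(2 * n))      # rows are 0..n-1, columns are n..2n-1

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in unknown:
        root_row, root_col = find(i), find(n + j)
        if root_row == root_col:
            return False             # closed loop: cells are not unique
        parent[root_row] = root_col
    # Every row and column must be reachable through unknown cells.
    return len({find(v) for v in range(2 * n)}) == 1


cells = [(i, j) for i in range(3) for j in range(3)]
choices = [set(k) for k in combinations(cells, 4)]
valid = [k for k in choices if determines_table(k)]
print(len(choices), len(valid))
```

The configuration of Figure 14, cells (1,1), (2,2), (3,3) and (3,1) in the note's numbering, corresponds to `{(0, 0), (1, 1), (2, 2), (2, 0)}` in the zero-based indices used here and is among the admissible ones.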
VIII. The Interpretation of Markov Processes as
Cross-Classifications

A Markov chain (or process) is generally defined
by its probability transition matrix. An example of
a transition matrix of a four-state Markov chain is
shown in Figure 15; it shows the probability that
event E_j occurs at time k+1 given that event E_i
occurred at time k, where i and j are the row
and column indices respectively.

Figure 15

This table resembles a cross-classification. It is,
however, a table of conditional probabilities
whereas for the complete specification of a
cross-classification a joint probability table is needed.
A conditional probability table can be deduced from
an underlying joint probability table but a joint
probability table cannot be deduced from one of its
corresponding conditional probability tables
alone. A joint probability table can be
deduced from one of its conditional tables and one
of its marginal probability distributions. In a
Markov chain, an initial probability distribution
vector (hereafter PDV) is given; this row is one
of the two marginal probability sets in the
underlying joint probability table. The initial PDV
completes the specification needed for a
cross-classification interpretation of a Markov chain.
We will often subscript the PDV's with their
associated time index, e.g., the PDV at time k will
be PDV_k.
Continuing with the example in Figure 15,
suppose we are given a PDV of (.1, .2, .3, .4)
for events E_1 to E_4. Then the joint probability
table becomes that shown in Figure 16.
[Joint probability table; rows: time k, columns: time k+1]

Figure 16

The joint probabilities in the body of the table
are obtained from the conditional probabilities in
Figure 15 by multiplying the probabilities in each
row by the corresponding probability of PDV_k.

The event types are events observed at the two
times, k and k+1, while the event classes are E_1 to
E_4 for each type.
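
The row-by-row multiplication can be sketched as follows. This is an illustrative sketch only: the transition matrix is the one that appears in the matrix product later in this section, the PDV is (.1, .2, .3, .4), and the names P, pdv_k and joint_table are assumptions introduced here.

```python
# Transition matrix of the example (rows: time k, columns: time k+1).
P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.2, 0.0, 0.0, 0.8],
]
pdv_k = [0.1, 0.2, 0.3, 0.4]

def joint_table(transition, pdv):
    """Multiply each row of conditional probabilities by its PDV entry."""
    return [[p * t for t in row] for p, row in zip(pdv, transition)]

joint = joint_table(P, pdv_k)

# Right marginal (row sums) recovers PDV_k; bottom marginal (column sums) is PDV_{k+1}.
right_marginal = [round(sum(row), 2) for row in joint]
bottom_marginal = [round(sum(col), 2) for col in zip(*joint)]

for row in joint:
    print([round(x, 2) for x in row])
print("right marginal :", right_marginal)
print("bottom marginal:", bottom_marginal)
```

The rounding to two decimals is only for display; the underlying arithmetic is ordinary floating point.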

The present discussion tends to be in terms of
the transformation of PDV's, in contrast to the usual
emphasis on probabilities of going from one state
to another in a certain number of transitions (e.g.,
from E_i to E_j during time k to k+m). Thus the
usual approach is based on powers of the transition
matrix whereas this is based on sequences of PDV's.

A large portion of the problems couched in terms of
Markov chain theory are problems which assume that
the process is ergodic. A Markov process is
ergodic* if it converges to a steady state as k
gets larger. A Markov process is in a steady state
at time k only if PDV_k = PDV_{k+1}. For practical
purposes we could say that a process is in the
neighborhood of a steady state at time k+1 if PDV_k
differs from PDV_{k+1} by less than a specified small
amount. This neighborhood might be defined in terms
of the accuracy of the estimates of the probabilities
in the PDV's.



The bottom marginal (or PDV) in Figure 16 could
be obtained by multiplying together the right
marginal probabilities and the transition matrix
of Figure 15 thus:

                   | .6  .4   0   0 |
(.1, .2, .3, .4)   |  0  .6  .4   0 |  =  (.14, .16, .26, .44)
                   |  0   0  .6  .4 |
                   | .2   0   0  .8 |



Such a multiplication indicates the transformation 
of the PDV between time k and k+1. We can obtain 
PDV_{k+2} by multiplying the result by the same

transition matrix. In repeated multiplication of 
the previous result, the difference between successive 
PDV’s is seen to diminish; the process is approaching 
the steady state. The process is in a steady state 
if the input PDV is the same as the output PDV, i.e. 



                       | .6  .4   0   0 |
(x_1, x_2, x_3, x_4)   |  0  .6  .4   0 |  =  (x_1, x_2, x_3, x_4)
                       |  0   0  .6  .4 |
                       | .2   0   0  .8 |
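
The repeated multiplication and the "neighborhood of a steady state" criterion can be sketched as below. This is an illustration under stated assumptions, not the author's procedure: the tolerance tol, the use of the largest absolute component difference as the "specified small amount", and the function names step and iterate_to_neighborhood are all choices made here.

```python
P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.2, 0.0, 0.0, 0.8],
]

def step(pdv, transition):
    """One transition: row vector PDV_k times the transition matrix gives PDV_{k+1}."""
    n = len(transition)
    return [sum(pdv[i] * transition[i][j] for i in range(n)) for j in range(n)]

def iterate_to_neighborhood(pdv, transition, tol=1e-6, max_steps=10_000):
    """Apply step() until successive PDV's differ by less than tol in every component."""
    for k in range(1, max_steps + 1):
        nxt = step(pdv, transition)
        if max(abs(a - b) for a, b in zip(pdv, nxt)) < tol:
            return nxt, k
        pdv = nxt
    raise RuntimeError("no convergence within max_steps")

pdv_k = [0.1, 0.2, 0.3, 0.4]
steady, n_steps = iterate_to_neighborhood(pdv_k, P)
print([round(x, 4) for x in steady], "after", n_steps, "transitions")
```

Solving the steady state equation by hand for this matrix gives x_1 = x_2 = x_3 and x_4 = 2 x_1, i.e. (.2, .2, .2, .4), which the iteration approaches.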

* "Ergodic" is defined by Feller (11), p. 553. A
different definition is given by Kemeny & Snell (12),
p. 99; they include periodic or cyclic Markov chains
in ergodic chains, Feller does not.
Figure 17 is the joint probability table of the
steady state of the process, the transition matrix
of which is shown in Figure 15.

Figure 18 is the corresponding Z-measure table.

Figure 18

As an ergodic Markov process progresses toward
the steady state, the transition matrix remains
invariant. The following items generally change as
the time index (k) changes:

1. Both sets of marginal probabilities (the PDV's)

2. The joint probability table

3. The Z-measure

4. The conditional probability table which reverses
the effect of the transition matrix, i.e., that
which would take PDV_{k+1} to PDV_k.

The table in (4.) is that obtained by dividing the
joint probabilities associated with the k to k+1
transition by the marginal probabilities in the margin
below. The transformation from PDV_{k+1} to PDV_k
could also be accomplished with the inverse of the
transition matrix if the transition matrix is non-
singular; such an inverse is invariant with respect
to k.
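
The reversing conditional table can be sketched as follows. This is a hedged illustration: the function name reverse_conditional and the 0-based indexing are assumptions made here, and the sketch presumes that no bottom-marginal probability is zero (true for the example matrix and PDV used in this section).

```python
P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.2, 0.0, 0.0, 0.8],
]

def reverse_conditional(transition, pdv):
    """Return (reverse table, PDV_{k+1}) for the k to k+1 transition.

    reverse[j][i] is the probability of E_i at time k given E_j at time k+1,
    i.e. the joint probability divided by the bottom (column) marginal.
    Assumes every column marginal is non-zero.
    """
    n = len(transition)
    joint = [[p * t for t in row] for p, row in zip(pdv, transition)]
    col_marginal = [sum(col) for col in zip(*joint)]
    reverse = [[joint[i][j] / col_marginal[j] for i in range(n)] for j in range(n)]
    return reverse, col_marginal

pdv_k = [0.1, 0.2, 0.3, 0.4]
reverse_k, pdv_k1 = reverse_conditional(P, pdv_k)

# Applying the reverse table to PDV_{k+1} takes us back to PDV_k.
recovered = [sum(pdv_k1[j] * reverse_k[j][i] for j in range(4)) for i in range(4)]
print([round(x, 4) for x in recovered])

# The reverse table for the next transition differs: it depends on k.
reverse_k1, _ = reverse_conditional(P, pdv_k1)
print(round(reverse_k[0][0], 4), round(reverse_k1[0][0], 4))
```

Unlike the matrix inverse mentioned in the text, this table always has non-negative entries and rows summing to one, but it must be recomputed at every time index.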

A special kind of Markov chain which is an 
exception to the above characterization is that in 
which the transition to each state is independent 
of the state from which the transition began. Then 
the following conditions prevail:

1. All rows of the transition matrix are the same

2. The joint probabilities are products of their
marginals

3. All Z-measures are zero

4. Given any initial PDV the convergence to the
steady state is immediate and complete at time
k+1 and all further transitions do not change
the PDV's.
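
Conditions 1, 2 and 4 can be checked numerically with a small sketch. The common row and the initial PDV below are arbitrary values chosen for illustration, and the names are assumptions; condition 3 is not checked because the Z-measure formula is not restated in this section.

```python
def step(pdv, transition):
    """Row vector PDV_k times the transition matrix gives PDV_{k+1}."""
    n = len(transition)
    return [sum(pdv[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Condition 1: every row of the transition matrix is the same (illustrative values).
common_row = [0.2, 0.2, 0.2, 0.4]
P_indep = [common_row[:] for _ in range(4)]

# Any initial PDV (it must sum to one).
pdv_0 = [0.7, 0.1, 0.1, 0.1]

pdv_1 = step(pdv_0, P_indep)   # condition 4: already the steady state
pdv_2 = step(pdv_1, P_indep)   # further transitions change nothing

# Condition 2: each joint probability is the product of its two marginals.
joint = [[p * t for t in row] for p, row in zip(pdv_0, P_indep)]
row_marg = [sum(row) for row in joint]
col_marg = [sum(col) for col in zip(*joint)]
is_product = all(
    abs(joint[i][j] - row_marg[i] * col_marg[j]) < 1e-12
    for i in range(4) for j in range(4)
)

print([round(x, 4) for x in pdv_1], [round(x, 4) for x in pdv_2], is_product)
```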

This special kind of Markov process suggests 
a potential basis for the comparison or character- 
ization of Markov processes: the rapidity of convergence 
to a neighborhood of the steady state. Apparently 
the farther the Z-measures of the joint probability 
table of the steady state are from zero, the slower 
is the convergence. Notice, however, that zeroes 
and ones in the transition matrix indicate cells 
whose Z-measures are invariant during convergence 
so that such cells should be treated differently 
(perhaps excluded) from the cells which change 
during convergence. The theory needs further 
development in this area. 
The steady state joint probability table and
Z-measures are fixed for a given transition matrix
and are independent of the initial PDV. Thus they
are obvious choices for characterizing a Markov
process. An infinite number of transition matrices
converge to each steady state PDV; they all have
different joint probability tables and different
Z-measures.
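
A small sketch illustrates why many transition matrices share one steady state PDV. It is an illustration under assumptions introduced here: P_flow is the example matrix of this section, P_indep is a matrix whose rows all equal the steady state PDV (.2, .2, .2, .4), and mix forms weighted averages of the two, each of which preserves the same PDV.

```python
def step(pdv, transition):
    """Row vector PDV_k times the transition matrix gives PDV_{k+1}."""
    n = len(transition)
    return [sum(pdv[i] * transition[i][j] for i in range(n)) for j in range(n)]

def joint_table(transition, pdv):
    """Joint probabilities: each row of the transition matrix scaled by its PDV entry."""
    return [[p * t for t in row] for p, row in zip(pdv, transition)]

def mix(a, m1, m2):
    """Weighted average a*m1 + (1-a)*m2 of two transition matrices, 0 <= a <= 1."""
    return [[a * x + (1 - a) * y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

target = [0.2, 0.2, 0.2, 0.4]
P_flow = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.2, 0.0, 0.0, 0.8],
]
P_indep = [target[:] for _ in range(4)]

# Every mixture leaves the target PDV unchanged, so each has it as a steady state.
weights = [0.0, 0.25, 0.5, 0.75, 1.0]
unchanged = [
    all(abs(a - b) < 1e-12 for a, b in zip(step(target, mix(w, P_flow, P_indep)), target))
    for w in weights
]
print(unchanged)

# The steady state joint tables nevertheless differ from one matrix to another.
joint_flow = joint_table(P_flow, target)
joint_indep = joint_table(P_indep, target)
print(round(joint_flow[0][2], 4), round(joint_indep[0][2], 4))
```

Since the weight can take any value between 0 and 1, this already yields infinitely many transition matrices for the one steady state PDV.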



If one is altering a Markov process in order to
have it converge to a target steady state, one method
would be the comparison of the Z-measure matrix of
the process at present with that of the target steady
state. This will show which transitions must
become more likely and which ones must become less
likely in order to attain the target. It may well
be, however, that the transition matrix which
maintains the steady state PDV does not matter,
i.e., only the PDV itself matters. Then many
alternative Z-measure configurations could be
compared to see which would be the preferred target,
based perhaps on time, cost, and other criteria.

The Markov process defined in Figure 15
has a 4 x 4 joint probability table so it has
3 x 3 = 9 degrees of freedom. There are eight
zeroes in the joint table which remain there
throughout all transitions; these lead to corres-
ponding fixed Z-measures of minus one. By choosing
any one of the non-negative Z's one can determine
the entire process. This is a special (flow
process) case and illustrates the method of reverse
inference discussed in the previous section.